Gamma Regression

STA6235: Modeling in Regression

Gamma Regression

  • Consider the gamma distribution, f(y|\mu, \gamma) = \frac{1}{\Gamma(\gamma) \left( \frac{\mu}{\gamma} \right)^\gamma} y^{\gamma-1} \exp\left\{ \frac{-y \gamma}{\mu} \right\}

  • where: y > 0, \mu > 0, \gamma > 0, and \Gamma(\cdot) is the Gamma function

  • This is appropriate for continuous, positive data that has a right skew.

    • I have primarily used it for complete time-to-event data

      • e.g., length of stay
  • The canonical link is the negative inverse…

    • …but the common link to use is the log.

Gamma Regression - R Syntax

  • We will use the glm() function to perform Gamma regression,
m <- glm([outcome] ~ [pred_1] + [pred_2] + ... + [pred_k], 
         data = [dataset], 
         family = Gamma(link = "log"))
  • Note the new-to-us piece of syntax: link = "log" attached to family.

Data from the Jackson Heart Study

library(tidyverse)
library(haven)
data <- read_sas("/Users/sseals/Library/CloudStorage/GoogleDrive-sseals@uwf.edu/My Drive/STA6235/Fa23/data/analysis1.sas7bdat") %>%
  select(subjid, HbA1c, age, BMI3cat) %>%
  na.omit()

Example: HbA1c in the Jackson Heart Study

data %>% ggplot(aes(x = HbA1c)) +
  geom_histogram(binwidth = 0.5, color="hotpink", fill="pink") +
  labs(y = "Count",
       x = "Hemoglobin A1c") +
  theme_bw()

Example: HbA1c in the Jackson Heart Study

  • Let’s take a basic example,
m1 <- glm(HbA1c ~ age + as.factor(BMI3cat), 
          data = data, 
          family = Gamma(link = "log"))

Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat), family = Gamma(link = "log"), 
    data = data)

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          1.6245542  0.0168291  96.532  < 2e-16 ***
age                  0.0033199  0.0003007  11.042  < 2e-16 ***
as.factor(BMI3cat)1 -0.0635853  0.0078001  -8.152 5.55e-16 ***
as.factor(BMI3cat)2 -0.1101423  0.0108309 -10.169  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.03165908)

    Null deviance: 74.373  on 2563  degrees of freedom
Residual deviance: 66.484  on 2560  degrees of freedom
AIC: 6939

Number of Fisher Scoring iterations: 4

\ln(y) = 1.625 + 0.003 \text{ age} - 0.064 \text{ BMI}_1 - 0.110 \text{ BMI}_2

Gamma Regression – Interpretations

  • Uh oh. We are now modeling ln(y) and not y directly…

    • We need to “undo” the ln() so that we can discuss the results in the original units of y
  • We will transform the coefficients:

\begin{align*} \ln(y) &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + ... \hat{\beta}_k x_k \\ y &= \exp\left\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + ... \hat{\beta}_kx_k \right\} \\ y &= e^{\hat{\beta}_0} e^{\hat{\beta}_1x_1} e^{\hat{\beta}_2 x_2} \cdot \cdot \cdot e^{\hat{\beta}_k x_k} \end{align*}

  • These are multiplicative effects, as compared to the additive effects we saw under the normal distribution.

Gamma Regression – Interpretations

  • In the JHS model,

\ln(y) = 1.625 + 0.003 \text{ age} - 0.064 \text{ BMI}_1 - 0.110 \text{ BMI}_2

  • For a 1 year increase in age, the expected HbA1c is multiplied by e^{0.003}=1.003. This is a 0.3% increase.

  • For a 10 year increase in age, the expected HbA1c is multiplied by e^{0.003 \times 10}=1.030. This is a 3% increase.

  • The expected HbA1c for those intermediate health (BMI1=1, BMI2=0) is e^{-0.064}=0.938 times that of those in poor health (BMI1=0, BMI2=0). This is a ~6% decrease.
  • The expected HbA1c for those ideal health (BMI1=0, BMI2=1) is e^{-0.110}=0.896 times that of those in poor health (BMI1=0, BMI2=0). This is a ~10% decrease.

Gamma Regression – Significance of Predictors

  • What we’ve learned so far re: significance of predictors holds true with GzLM

    • Significance of individual (continuous or binary) predictors \to t-test

    • Significance of categorical (>2 categories) predictors \to ANOVA with full/reduced models

      • We need to add test = "LRT" to the anova() function.
anova(reduced, full, test = "LRT")

Gamma Regression – Significance of Predictors

  • Let’s consider a different model with the JHS data,
m2 <- glm(HbA1c ~ age + as.factor(BMI3cat) + age:as.factor(BMI3cat), 
          data = data, 
          family = Gamma(link = "log"))

Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat) + age:as.factor(BMI3cat), 
    family = Gamma(link = "log"), data = data)

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)              1.6224094  0.0229685  70.636  < 2e-16 ***
age                      0.0033598  0.0004186   8.027 1.51e-15 ***
as.factor(BMI3cat)1     -0.0737300  0.0376780  -1.957   0.0505 .  
as.factor(BMI3cat)2     -0.0769523  0.0472673  -1.628   0.1036    
age:as.factor(BMI3cat)1  0.0001814  0.0006727   0.270   0.7874    
age:as.factor(BMI3cat)2 -0.0006265  0.0008658  -0.724   0.4694    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.03167834)

    Null deviance: 74.373  on 2563  degrees of freedom
Residual deviance: 66.460  on 2558  degrees of freedom
AIC: 6942.1

Number of Fisher Scoring iterations: 4

Gamma Regression – Significance of Predictors

  • Let’s see if the interaction between age and health status is significant,
full <- glm(HbA1c ~ age + as.factor(BMI3cat) + age:as.factor(BMI3cat), data = data, family = Gamma(link = "log"))
reduced <- glm(HbA1c ~ age + as.factor(BMI3cat), data = data, family = Gamma(link = "log"))
anova(reduced, full, test = "LRT")
  • The interaction between age and health status as indicated by BMI is not significant (p = 0.677), so it should be removed from the model.

Gamma Regression – Significance of Predictors

  • Removing the interaction term,

Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat), family = Gamma(link = "log"), 
    data = data)

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          1.6245542  0.0168291  96.532  < 2e-16 ***
age                  0.0033199  0.0003007  11.042  < 2e-16 ***
as.factor(BMI3cat)1 -0.0635853  0.0078001  -8.152 5.55e-16 ***
as.factor(BMI3cat)2 -0.1101423  0.0108309 -10.169  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.03165908)

    Null deviance: 74.373  on 2563  degrees of freedom
Residual deviance: 66.484  on 2560  degrees of freedom
AIC: 6939

Number of Fisher Scoring iterations: 4
m3 <- glm(HbA1c ~ age + as.factor(BMI3cat), 
          data = data, 
          family = Gamma(link = "log"))
summary(m3)
  • Age is a significant predictor of HbA1c (p < 0.001).

  • We need a partial F to determine if health status as defined by BMI is a significant predictor of HbA1c.

Gamma Regression – Significance of Predictors

full <- glm(HbA1c ~ age + as.factor(BMI3cat), data = data, family = Gamma(link = "log"))
reduced <- glm(HbA1c ~ age, data = data, family = Gamma(link = "log"))
anova(reduced, full, test = "LRT")
  • Thus, health status as defined by BMI is a significant predictor of HbA1c (p < 0.001).

Visualizations

  • Let’s construct a graph of the resulting model (live!).

  • What should be on the x-axis?

  • What should define the lines?

  • What do you think the “line” will look like?

# we will graph in here